The Minimum Description Length Principle in Coding and Modeling

Authors

  • Andrew R. Barron
  • Jorma Rissanen
  • Bin Yu
Abstract

We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
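
For orientation, the stochastic complexity reviewed in the abstract is commonly stated through the normalized maximum likelihood (NML) distribution. The display below is a standard formulation in notation chosen here (\(\hat\theta(x^n)\) the maximum likelihood estimate, \(k\) the number of parameters, \(I(\theta)\) the Fisher information); it is not quoted from the article:

\[
\bar p(x^n) = \frac{p\bigl(x^n \mid \hat\theta(x^n)\bigr)}{C_n},
\qquad
C_n = \sum_{y^n} p\bigl(y^n \mid \hat\theta(y^n)\bigr),
\]
\[
-\log \bar p(x^n) = -\log p\bigl(x^n \mid \hat\theta(x^n)\bigr) + \log C_n,
\qquad
\log C_n = \frac{k}{2}\log\frac{n}{2\pi} + \log \int_{\Theta} \sqrt{\det I(\theta)}\, d\theta + o(1),
\]

where the expansion of \(\log C_n\) holds for smooth parametric families under regularity conditions. The mixture and predictive codes named in the abstract attain this code length up to asymptotically vanishing terms.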

Similar articles

Iterated logarithmic expansions of the pathwise code lengths for exponential families

Rissanen's Minimum Description Length (MDL) principle is a statistical modeling principle motivated by coding theory. For exponential families we obtain pathwise expansions, to the constant order, of the predictive and mixture code lengths used in MDL. The results are useful for understanding different MDL forms.
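
For readers new to the terminology, the two code lengths in question have the following standard definitions, written in notation chosen here (\(\hat\theta(x^{t-1})\) an estimator such as the maximum likelihood estimate from the first \(t-1\) observations, with some default predictor for the initial step, and \(w\) a prior density on the parameter set \(\Theta\)); they are not taken from this abstract:

\[
L_{\mathrm{pred}}(x^n) = \sum_{t=1}^{n} -\log p\bigl(x_t \mid \hat\theta(x^{t-1})\bigr),
\qquad
L_{\mathrm{mix}}(x^n) = -\log \int_{\Theta} p(x^n \mid \theta)\, w(\theta)\, d\theta.
\]

A pathwise expansion describes these code lengths along individual data sequences rather than in expectation, which is where terms governed by the law of the iterated logarithm typically enter.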

On the minimum description length principle for sources with piecewise constant parameters

Universal lossless coding in the presence of finitely many abrupt changes in the statistics of the source, at unknown points, is investigated. The minimum description length (MDL) principle is derived for this setting. In particular, it is shown that for any uniquely decipherable code, for almost every combination of statistical parameter vectors governing each segment, and for almost every vec...

Adaptive partially hidden Markov models with application to bilevel image coding

Partially hidden Markov models (PHMMs) have previously been introduced. The transition and emission/output probabilities from hidden states, as known from HMMs, are conditioned on the past. In this way, the HMM may be applied to images, with the dependencies of the second dimension introduced by conditioning. In this paper, the PHMM is extended to multiple sequences with a multiple token version an...

Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity

The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles minimum description length (MDL) and minimum message length (MML), abstracted as the ideal MDL principle and defined from Bayes’s rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be app...
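
As a compact reminder of the connection sketched here (standard notation, not quoted from the paper): maximizing the Bayesian posterior \(P(H \mid D) \propto P(H)\, P(D \mid H)\) over hypotheses \(H\) is equivalent to minimizing the two-part code length \(-\log P(H) - \log P(D \mid H)\); replacing these probabilities by the universal distribution \(\mathbf{m}(\cdot) \approx 2^{-K(\cdot)}\), with \(K\) the prefix Kolmogorov complexity, yields the ideal MDL objective

\[
\min_{H} \bigl[\, K(H) + K(D \mid H) \,\bigr],
\]

subject to the conditions on the data and hypotheses that the paper makes precise.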

Chapter 7. Asymptotics and Coding Theory: One of the n! Dimensions of Terry

Terry joined the Berkeley Statistics faculty in the summer of 1987 after being the statistics head of CSIRO in Australia. His office was just down the hallway from mine on the third floor of Evans. I was beginning my third year at Berkeley then and I remember talking to him in the hallway after a talk that he gave on information theory and the Minimum Description Length (MDL) Principle of Rissa...


Journal:
  • IEEE Trans. Information Theory

Volume: 44    Issue:

Pages:

Publication date: 1998